comparative evaluation
Supplementary Material: Knowledge-based in silico models and dataset for the comparative evaluation of mammography AI for a range of breast characteristics, lesion conspicuities and doses
M-SYNTH is organized into a directory structure that indicates the parameters. Code and dataset is released with the Creative Commons 1.0 Universal License We now review the timing required to perform mass insertion and imaging. In Table 2, we review the imaging time required for each breast density. The time varies from 2.84 GPU), we were able to generate the complete dataset in about two weeks.Breast Density Time (min) Fatty 13.463809 Scattered 11.002291 Hetero 3.655613 Dense 2.842028 Table 2: Timing analysis for imaging by breast density. Additional renderings of the breast phantoms generated for the study are shown in Figure 1, demonstrating a high level of detail and anatomical variability within and among models.
- Health & Medicine > Diagnostic Medicine > Imaging (0.51)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.41)
Knowledge-based in silico models and dataset for the comparative evaluation of mammography AI for a range of breast characteristics, lesion conspicuities and doses
To generate evidence regarding the safety and efficacy of artificial intelligence (AI) enabled medical devices, AI models need to be evaluated on a diverse population of patient cases, some of which may not be readily available. We propose an evaluation approach for testing medical imaging AI models that relies on in silico imaging pipelines in which stochastic digital models of human anatomy (in object space) with and without pathology are imaged using a digital replica imaging acquisition system to generate realistic synthetic image datasets. Here, we release M-SYNTH, a dataset of cohorts with four breast fibroglandular density distributions imaged at different exposure levels using Monte Carlo x-ray simulations with the publicly available Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) toolkit. We utilize the synthetic dataset to analyze AI model performance and find that model performance decreases with increasing breast density and increases with higher mass density, as expected. As exposure levels decrease, AI model performance drops with the highest performance achieved at exposure levels lower than the nominal recommended dose for the breast type.
- Health & Medicine > Diagnostic Medicine > Imaging (0.76)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.43)
- North America > United States > Maryland > Montgomery County > Silver Spring (0.04)
- Europe > Portugal (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > China (0.04)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.46)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Health Care Technology (0.95)
- Government > Regional Government > North America Government > United States Government (0.68)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.53)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Supplementary Material: Knowledge-based in silico models and dataset for the comparative evaluation of mammography AI for a range of breast characteristics, lesion conspicuities and doses
M-SYNTH is organized into a directory structure that indicates the parameters. Code and dataset is released with the Creative Commons 1.0 Universal License We now review the timing required to perform mass insertion and imaging. In Table 2, we review the imaging time required for each breast density. The time varies from 2.84 GPU), we were able to generate the complete dataset in about two weeks.Breast Density Time (min) Fatty 13.463809 Scattered 11.002291 Hetero 3.655613 Dense 2.842028 Table 2: Timing analysis for imaging by breast density. Additional renderings of the breast phantoms generated for the study are shown in Figure 1, demonstrating a high level of detail and anatomical variability within and among models.
- Health & Medicine > Diagnostic Medicine > Imaging (0.51)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.41)
Capabilities of GPT-5 across critical domains: Is it the next breakthrough?
The accelerated evolution of large language models has raised questions about their comparative performance across domains of practical importance. GPT-4 by OpenAI introduced advances in reasoning, multimodality, and task generalization, establishing itself as a valuable tool in education, clinical diagnosis, and academic writing, though it was accompanied by several flaws. Released in August 2025, GPT-5 incorporates a system-of-models architecture designed for task-specific optimization and, based on both anecdotal accounts and emerging evidence from the literature, demonstrates stronger performance than its predecessor in medical contexts. This study provides one of the first systematic comparisons of GPT-4 and GPT-5 using human raters from linguistics and clinical fields. Twenty experts evaluated model-generated outputs across five domains: lesson planning, assignment evaluation, clinical diagnosis, research generation, and ethical reasoning, based on predefined criteria. Mixed-effects models revealed that GPT-5 significantly outperformed GPT-4 in lesson planning, clinical diagnosis, research generation, and ethical reasoning, while both models performed comparably in assignment assessment. The findings highlight the potential of GPT-5 to serve as a context-sensitive and domain-specialized tool, offering tangible benefits for education, clinical practice, and academic research, while also advancing ethical reasoning. These results contribute to one of the earliest empirical evaluations of the evolving capabilities and practical promise of GPT-5.
- North America > United States (0.04)
- Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
- North America > Canada (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine (1.00)
- Education (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.36)
Intelligent Routing for Sparse Demand Forecasting: A Comparative Evaluation of Selection Strategies
Sparse and intermittent demand forecasting in supply chains presents a critical challenge, as frequent zero-demand periods hinder traditional model accuracy and impact inventory management. We propose and evaluate a Model-Router framework that dynamically selects the most suitable forecasting model-spanning classical, ML, and DL methods for each product based on its unique demand pattern. By comparing rule-based, LightGBM, and InceptionTime routers, our approach learns to assign appropriate forecasting strategies, effectively differentiating between smooth, lumpy, or intermittent demand regimes to optimize predictions. Experiments on the large-scale Favorita dataset show our deep learning (Inception Time) router improves forecasting accuracy by up to 11.8% (NWRMSLE) over strong, single-model benchmarks with 4.67x faster inference time. Ultimately, these gains in forecasting precision will drive substantial reductions in both stockouts and wasteful excess inventory, underscoring the critical role of intelligent, adaptive Al in optimizing contemporary supply chain operations.
- North America > Canada > Ontario > Toronto (0.05)
- Oceania > Australia (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- (3 more...)
- Information Technology > Modeling & Simulation (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.90)
Comparative Evaluation of Prompting and Fine-Tuning for Applying Large Language Models to Grid-Structured Geospatial Data
Dhruv, Akash, Xie, Yangxinyu, Branham, Jordan, Mallick, Tanwi
This paper presents a comparative study of large language models (LLMs) in interpreting grid-structured geospatial data. We evaluate the performance of a base model through structured prompting and contrast it with a fine-tuned variant trained on a dataset of user-assistant interactions. Our results highlight the strengths and limitations of zero-shot prompting and demonstrate the benefits of fine-tuning for structured geospatial and temporal reasoning.
- North America > United States > Illinois > Cook County > Lemont (0.05)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Asia > Middle East > Jordan (0.04)
- Energy (0.48)
- Government > Regional Government (0.47)
Classification of power quality events in the transmission grid: comparative evaluation of different machine learning models
Güvengir, Umut, Küçük, Dilek, Buhan, Serkan, Mantaş, Cuma Ali, Yeniceli, Murathan
Automatic classification of electric power quality events with respect to their root causes is critical for electrical grid management. In this paper, we present comparative evaluation results of an extensive set of machine learning models for the classification of power quality events, based on their root causes. After extensive experiments using different machine learning libraries, it is observed that the best performing learning models turn out to be Cubic SVM and XGBoost. During error analysis, it is observed that the main source of performance degradation for both models is the classification of ABC faults as ABCG faults, or vice versa. Ultimately, the models achieving the best results will be integrated into the event classification module of a large-scale power quality and grid monitoring system for the Turkish electricity transmission system.
Knowledge-based in silico models and dataset for the comparative evaluation of mammography AI for a range of breast characteristics, lesion conspicuities and doses
To generate evidence regarding the safety and efficacy of artificial intelligence (AI) enabled medical devices, AI models need to be evaluated on a diverse population of patient cases, some of which may not be readily available. We propose an evaluation approach for testing medical imaging AI models that relies on in silico imaging pipelines in which stochastic digital models of human anatomy (in object space) with and without pathology are imaged using a digital replica imaging acquisition system to generate realistic synthetic image datasets. Here, we release M-SYNTH, a dataset of cohorts with four breast fibroglandular density distributions imaged at different exposure levels using Monte Carlo x-ray simulations with the publicly available Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) toolkit. We utilize the synthetic dataset to analyze AI model performance and find that model performance decreases with increasing breast density and increases with higher mass density, as expected. As exposure levels decrease, AI model performance drops with the highest performance achieved at exposure levels lower than the nominal recommended dose for the breast type.
- Health & Medicine > Diagnostic Medicine > Imaging (0.74)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.40)
Energy Price Modelling: A Comparative Evaluation of four Generations of Forecasting Methods
Andrei, Alexandru-Victor, Velev, Georg, Toma, Filip-Mihai, Pele, Daniel Traian, Lessmann, Stefan
Energy is a critical driver of modern economic systems. Accurate energy price forecasting plays an important role in supporting decision-making at various levels, from operational purchasing decisions at individual business organizations to policy-making. A significant body of literature has looked into energy price forecasting, investigating a wide range of methods to improve accuracy and inform these critical decisions. Given the evolving landscape of forecasting techniques, the literature lacks a thorough empirical comparison that systematically contrasts these methods. This paper provides an in-depth review of the evolution of forecasting modeling frameworks, from well-established econometric models to machine learning methods, early sequence learners such LSTMs, and more recent advancements in deep learning with transformer networks, which represent the cutting edge in forecasting. We offer a detailed review of the related literature and categorize forecasting methodologies into four model families. We also explore emerging concepts like pre-training and transfer learning, which have transformed the analysis of unstructured data and hold significant promise for time series forecasting. We address a gap in the literature by performing a comprehensive empirical analysis on these four family models, using data from the EU energy markets, we conduct a large-scale empirical study, which contrasts the forecasting accuracy of different approaches, focusing especially on alternative propositions for time series transformers.
- Europe (1.00)
- North America > United States (0.28)
- Overview (1.00)
- Research Report > New Finding (0.67)
- Research Report > Promising Solution (0.46)
- Energy > Renewable (1.00)
- Energy > Power Industry (1.00)
- Banking & Finance > Trading (1.00)
- Energy > Oil & Gas (0.93)